A Heuristic on Effective and Efficient Clustering on Uncertain Objects
Authors
Abstract
We study the problem of clustering uncertain objects whose locations are described by probability density functions. We analyze existing pruning algorithms and show experimentally that they introduce a new performance bottleneck: the overhead of pruning candidate clusters when assigning each uncertain object in each iteration. We further show that, under squared Euclidean distance, UK-means (without pruning techniques) reduces to K-means and runs much faster than the pruning algorithms, although its clustering results differ somewhat because a different distance function is used. We therefore propose Approximate UK-means, which heuristically identifies boundary-case objects and re-assigns them to better clusters. Our experimental results show that, on average, the execution time of Approximate UK-means is only 25% more than that of K-means, and that our approach reduces the discrepancies in K-means' clustering results by more than 70% in the best case.
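The reduction of UK-means to K-means under squared Euclidean distance follows from a standard identity: for an uncertain object X with mean μ and a cluster center c, E[‖X − c‖²] = ‖μ − c‖² + E[‖X − μ‖²]. The second term does not depend on c, so the nearest center by expected squared distance is also the nearest center to μ. The sketch below only illustrates this identity. It is not the paper's Approximate UK-means, it assumes each pdf is represented by a finite set of samples, and the function names are hypothetical.

```python
import numpy as np


def expected_sq_dist(samples, center):
    """Estimate E[||X - c||^2] from samples (rows) of one uncertain object."""
    diffs = samples - center
    return float(np.mean(np.sum(diffs * diffs, axis=1)))


def assign_by_expected_sq_dist(objects, centers):
    """UK-means-style assignment: smallest expected squared distance wins."""
    return [
        int(np.argmin([expected_sq_dist(s, c) for c in centers]))
        for s in objects
    ]


def assign_by_mean(objects, centers):
    """K-means-style assignment: use only the mean of each object's samples."""
    means = np.array([s.mean(axis=0) for s in objects])
    # Pairwise squared distances between object means and centers.
    d = ((means[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return [int(i) for i in np.argmin(d, axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumed toy data: 20 uncertain 2-D objects, 200 samples each.
    objects = [
        rng.normal(loc=rng.uniform(-5, 5, size=2), scale=1.0, size=(200, 2))
        for _ in range(20)
    ]
    centers = rng.uniform(-5, 5, size=(3, 2))
    same = assign_by_expected_sq_dist(objects, centers) == assign_by_mean(objects, centers)
    print("assignments agree:", same)
```

The mean-based assignment touches each object once per center instead of once per sample per center, which is where the speed-up over sample-based expected distances comes from in this toy setting.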
Related resources
A heuristic approach to effective and efficient clustering on uncertain objects
We study the problem of clustering uncertain objects whose locations are uncertain and described by probability density functions (pdf). We analyze existing pruning algorithms and experimentally show that there exists a new bottleneck in the performance due to the overhead of pruning candidate clusters for assignment of each uncertain object in each iteration. In this article, we will show that...
A Clustering Approach by SSPCO Optimization Algorithm Based on Chaotic Initial Population
The main task of clustering analysis is to assign a set of objects to groups such that objects in one group or cluster are more similar to each other than to objects in other clusters. The SSPCO optimization algorithm is a new optimization algorithm inspired by the behavior of a type of bird called the see-see partridge. One of the things that smart algorithms are applied to solve is the problem ...
Technique For Clustering Uncertain Data Based On Probability Distribution Similarity
Clustering on uncertain data is one of the essential tasks in data mining. Traditional algorithms for clustering uncertain data, such as K-means, UK-means, and density-based clustering, are limited to geometric distance-based similarity measures and cannot capture differences between the distributions of uncertain objects. Such methods cannot handle uncertain objects...
Clustering and Classification on Uncertain Data
We study the problem of mining on uncertain objects whose locations are uncertain and described by probability density functions (pdf). Clustering and classification are two important tasks in data mining. Clustering on uncertain objects differs from the traditional case of certain objects. UK-means was proposed based on K-means, but it is time-consuming. Pruning techniques are proposed to impro...
Subspace Clustering for Uncertain Data
Analyzing uncertain databases is a challenge in data mining research. Usually, data mining methods rely on precise values. In scenarios where uncertain values occur, e.g. due to noisy sensor readings, these algorithms cannot deliver high-quality patterns. Besides uncertainty, data mining methods face another problem: high-dimensional data. For finding object groupings with locally relevant dimens...
Publication date: 2010